SAM 3 is the third - generation promptable segmentation foundation model launched by Meta. It can use text or visual prompts (points, boxes, masks) to detect, segment, and track objects in images and videos. Compared with previous generations, SAM 3 introduces the ability to exhaustively segment all instances of open - vocabulary concepts, supports a large number of open - vocabulary prompts, and achieves 75 - 80% of human performance on the SA - CO benchmark.
Computer Vision
TransformersEnglish